
Add Data - CSV Upload UI #6845

Merged · 23 commits · Jun 1, 2016

Conversation

@Bargs (Contributor) commented Apr 9, 2016

Fixes #6541
Depends on #6844 and #6842

This PR implements the CSV Upload Add Data Wizard. Some of the highlights of the changes:

  • A new upload-wizard directive which acts as the container for the CSV wizard. This intentionally duplicates some code and structure from the Tail a File wizard to avoid prematurely DRYing things up before we really know exactly what a wizard API needs to look like.
  • A new parse-csv-step directive which allows a user to pick and preview their CSV
  • A new upload-data-step directive for uploading the CSV to the Kibana _data API and displaying the results to the user
  • A new uploadCSV method on the ingest service, for sending File objects to the Kibana _data API (a rough sketch follows below)

The CSV wizard does not currently include a pipeline creation step. It will be re-added once we have editable pipelines implemented.
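
A rough sketch of what that uploadCSV method could look like, assuming Angular 1.x and a FormData-based multipart upload (the module wiring and the exact _data endpoint path are assumptions, not the merged code):

    // a minimal sketch, not the merged implementation
    module.service('ingest', function ($http) {
      this.uploadCSV = function (file) {
        const formData = new FormData();
        formData.append('csv', file); // send the raw File object as multipart data

        return $http.post('../api/kibana/_data', formData, {
          transformRequest: angular.identity,    // don't serialize the FormData
          headers: { 'Content-Type': undefined } // let the browser set the multipart boundary
        });
      };
    });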

@rashidkpc (Contributor)

jenkins, test it

@rashidkpc (Contributor)

@tsullivan can you look at the test failure here? It seems to be related to the uuid code

rashidkpc assigned tsullivan and unassigned rashidkpc on Apr 14, 2016
@tsullivan (Member)

Looking into it. I get the failure locally too

@tsullivan (Member)

@rashidkpc looks like all the tests that use this mechanism cause a failure:

    kbnServer = kbnTestServer.createServer({
      plugins: {
        scanDirs: [
          fromRoot('src/plugins')
        ]
      }
    });

The server test suite only passes if I skip all these:

  • src/plugins/elasticsearch/lib/__tests__/manage_uuid.js
  • src/plugins/elasticsearch/lib/__tests__/routes.js
  • src/server/http/__tests__/index.js

My theory is that the scanDirs/fromRoot setup calls some constructor function without the new keyword, which causes this to be bound to the global object. But I'm still not really sure. Still looking.

@tsullivan (Member) commented Apr 15, 2016

@Bargs looks like the highland module is purposefully creating a global variable called nil and that's why mocha is failing with Error: global leak detected: nil

Source: https://github.com/caolan/highland/blob/master/dist/highland.js#L177

Looks like there is a way to register globals via config with grunt-simple-mocha: https://github.com/yaymukund/grunt-simple-mocha
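
A minimal Gruntfile sketch of that approach, assuming grunt-simple-mocha passes its options straight through to mocha (the task name and test paths here are illustrative):

    grunt.initConfig({
      simplemocha: {
        all: {
          src: ['test/**/*.js'],
          options: {
            // whitelist highland's intentional global so mocha's leak check passes
            globals: ['nil']
          }
        }
      }
    });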

@tsullivan (Member)

Has conflicts that need to be resolved

tsullivan assigned Bargs and unassigned tsullivan on Apr 25, 2016
@Bargs (Contributor, Author) commented Apr 27, 2016

Thanks @tsullivan, adding nil to the list of globals fixed the issue.

Bargs force-pushed the ingest/uploadUI branch 2 times, most recently from 3588918 to cad25fc on April 29, 2016 16:14
Bargs removed the review label on Apr 29, 2016
Bargs force-pushed the ingest/uploadUI branch from cad25fc to 868bee4 on April 29, 2016 22:07
Bargs force-pushed the ingest/uploadUI branch 4 times, most recently from 5643e53 to e9a5ebe on May 12, 2016 16:23
@Bargs (Contributor, Author) commented May 12, 2016

jenkins, test it

Bargs force-pushed the ingest/uploadUI branch from e9a5ebe to bed8725 on May 12, 2016 21:54
Bargs force-pushed the ingest/uploadUI branch from bed8725 to ed5e4e3 on May 12, 2016 22:16
    @@ -28,7 +28,7 @@ modules.get('apps/settings')
       _.forEach(res.data, (response) => {
         this.created += response.created;
         this.formattedErrors = this.formattedErrors.concat(_.map(_.get(response, 'errors.index'), (doc) => {
    -      return `${doc._id.split('-', 1)[0].replace('L', 'Line ').trim()}: ${doc.error.type} - ${doc.error.reason}`;
    +      return `Line ${doc._id.substr(doc._id.lastIndexOf(':') + 1)}: ${doc.error.type} - ${doc.error.reason}`;

A reviewer (Contributor) commented on this change:

Is it worth pulling this logic into a function? Especially since the format of the data has changed now during the development of this feature?
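
A hypothetical extraction along those lines (the function name is illustrative, not necessarily what was merged):

    // pulls the _id parsing and message formatting out of the _.map callback
    function formatIndexError(doc) {
      // the document _id encodes the source line number after the last ':'
      const line = doc._id.substr(doc._id.lastIndexOf(':') + 1);
      return `Line ${line}: ${doc.error.type} - ${doc.error.reason}`;
    }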

@BigFunger (Contributor)

One more small suggested change, not a deal breaker. Once tests pass, LGTM.

BigFunger assigned Bargs and unassigned BigFunger on May 25, 2016
@Bargs (Contributor, Author) commented May 25, 2016

jenkins, test it

Bargs assigned BigFunger and unassigned Bargs on May 25, 2016
@Bargs (Contributor, Author) commented May 25, 2016

@BigFunger

  • Extracted error formatting into its own function
  • Added an API test for ID format

@BigFunger (Contributor)

LGTM

BigFunger assigned Bargs and unassigned BigFunger on May 26, 2016
@BigFunger (Contributor)

Still LGTM. Thanks for addressing the duplicate column issue.

Bargs assigned w33ble and unassigned Bargs on May 27, 2016
@w33ble (Contributor) commented May 27, 2016

I have a CSV file with 2,600+ columns and 68,000+ rows. Importing that seems to blow up the parser or something.

(two screenshots of the import blowing up in the browser)

Also, apparently Elasticsearch doesn't allow that many fields on a document:

(screenshot of the Elasticsearch field limit error)

Pretty edge case, I know, just wanted to share.

@w33ble (Contributor) commented May 27, 2016

@Bargs since you seemed to think the large-file import problem was caused by misuse of PapaParse, I'm sending this back to you to fix.

Other than that, LGTM!

w33ble assigned Bargs and unassigned w33ble on May 27, 2016
Bargs added 2 commits on May 31, 2016:
Previously I was using PapaParse's preview option, but it turns
out that does not prevent the library from loading the entire file
if you don't use one of the streaming callbacks. So now we have to
do some extra gymnastics to gather the data per row in the step
callback, manually abort parsing if we haven't hit the end of the file
after a given number of sample lines and then digest the data in the
complete callback.

I also added an extra condition to a watcher that caused parsing to
happen twice.
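
A minimal sketch of that approach using PapaParse's streaming API (the sample size and what happens with the gathered rows are illustrative, not the merged code):

    const SAMPLE_SIZE = 10;
    const sampleRows = [];

    Papa.parse(file, {
      step: (results, parser) => {
        sampleRows.push(results.data); // gather the data per row
        if (sampleRows.length >= SAMPLE_SIZE) {
          parser.abort();              // manually abort before reading the whole file
        }
      },
      complete: () => {
        // digest the sampled rows here, e.g. build the preview table
      }
    });
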
@Bargs (Contributor, Author) commented Jun 1, 2016

@w33ble just pushed some performance improvements. Really glad you caught this issue. After profiling the code I realized there was a combination of things dragging performance down. Give it another whirl and tell me what you think. I was able to add your 700MB CSV to the first wizard step in about 8 seconds without the browser choking. As far as I can tell, the bottleneck is Angular now: with all those columns, it has to create a lot of DOM elements. If you have any suggestions on optimizing that, I'd be all ears!

…ance degrades if the user has hundreds or thousands
@w33ble (Contributor) commented Jun 1, 2016

Labels: Feature:Add Data, review
6 participants